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O ■ Abstract 
(N 

C , We propose a factor-graph-based approach to joint channel-estimation-and-decoding (JCED) of bit- 

interleaved coded orthogonal frequency division multiplexing (BICM-OFDM). In contrast to existing 
' designs, ours is capable of exploiting not only sparsity in sampled channel taps but also clustering among 

the large taps, behaviors which are known to manifest at larger communication bandwidths. In order to 
exploit these channel-tap structures, we adopt a two-state Gaussian mixture prior in conjunction with a 
O . Markov model on the hidden state. For loopy belief propagation, we exploit a "generaUzed approximate 

message passing" (GAMP) algorithm recently developed in the context of compressed sensing, and 
K*- ' show that it can be successfully coupled with soft-input soft-output decoding, as well as hidden Markov 

^SJ , inference, through the standard sum-product framework. For N subcarriers and any channel length L < 

.,— j. . TV, the resulting JCED-GAMP scheme has a computational complexity of only 0{N log^ N + N\E>\), 

where |§| is the constellation size. Numerical experiments using IEEE 802.15.4a channels show that our 
scheme yields BER performance within 1 dB of the known-channel bound and 3-4 dB better than soft 
equalization based on LMMSE and LASSO. 

X 

H I. Introduction 

When designing a digital communications receiver, it is common to model the effects of multipath 
propagation in discrete time using a convolutive linear channel that, in the slow-fading scenario, can 
be characterized by a fixed impulse response {xj}^~Q over the duration of one codeword. When the 
communication bandwidth is sufficiently low, the "taps" {xj}^~Q are well modeled as independent 
complex Gaussian random variables, resulting in the "uncorrelated Rayleigh-fading" and "uncorrelated 
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Rician-fading" models that have dominated the wireless communications literature for many decades 
|[T1. For receiver design, the Gaussian tap assumption is very convenient because the optimal estimation 
scheme is well known to be linear IJl. As the communication bandwidth increases, however, the channel 
taps are no longer well-modeled as Gaussian nor independent. Rather, they tend to be heavy-tailed or 
"sparse" in that only a few values in {xj}^^Q have significant amplitude Il3l-|l6l. Moreover, groups of 
large taps are often clustered together in lag j. These behaviors are both a blessing and a curse: a blessing 
because, of all tap distributions, the independent Gaussian one is most detrimental to capacity iH, but 
a curse because optimal channel estimation becomes non-Unear and thus receiver design becomes more 
complicated. 

Recently, there have been many attempts to apply breakthrough non-linear estimation techniques from 
the field of "compressive sensing" fSl (e.g., LASSO 111, ifTOl ) to the wireless channel estimation problem. 
We refer to this approach as "compressed channel sensing" (CCS), after the recent comprehensive 
overview ifTTl . The CCS literature generally takes a decoupled approach to the problem of channel 
estimation and data decoding, in that pilot-symbol knowledge is first exploited for sparse-channel esti- 
mation, after which the channel estimate is used for data decoding. However, this decoupled approach is 
known to be suboptimal |[T2l . 

The considerations above motivate a joint approach to structured-sparse-channel-estimation and de- 
coding that offers both near-optimal decoding performance and low implementation complexity. In this 
paper, we propose exactly such a scheme. In particular, we focus on orthogonal frequency-division 
multiplexing (OFDM) with bit-interleaved coded modulation (BICM), and propose a novel factor-graph- 
based receiver that leverages recent results in "generalized approximate message passing" (GAMP) |[T3l . 
soft-input/soft-output (SISO) decoding | |143 . and structured-sparse estimation ifTSl . Our receiver assumes 
a clustered-sparse channel-tap prior constructed using a two-state Gaussian mixture with a Markov model 
on the hidden tap state. The scheme that we propose has only C>{N log2 N + N\S\) complexity, where 
N denotes the number of subcarriers and |S| denotes the constellation size, facilitating large values of 
N and channel length L < N (e.g., we use N = 1024 and L = 256 for our numerical results). For 
rich non-line-of-sight (NLOS) channels generated according to the IEEE 802.15.4a standard [TQ, our 
numerical experiments show bit error rate (BER) performance within 1 dB of the known-channel bound 
and 3-4 dB better than soft equalization based on LMMSE and LASSO. 

We now place our work in the context of existing factor-graph designs. Factor-graph based joint 
channel-estimation and decoding (JCED) was proposed more than a decade ago (see, e.g., the early 
overview 1171 ). To calculate the messages passed among the nodes of the factor graph, first instincts 
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suggest to apply the standard "sum-product algorithm" (SPA) |[T8l - ll20l . Exact SPA on the JCED factor 
graph is computationally infeasible, however, and so it must be approximated. For this, there are many 
options, since many well-known iterative inference algorithms can themselves be recognized as SPA 
approximations, e.g., the expectation-maximization (EM) algorithm |[2ll . particle filtering |[22l . variational 
(or "mean-field") techniques |[23l . and even steepest descent |[24l . Moreover, because the JCED factor 
graph is loopy, even non-approximate SPA is not guaranteed to yield the correct output distributions, 
because exact inference is NP hard |[25l . It is perhaps not surprising that, amidst this uncertainty about 
exact SPA and its "best" approximation, a number of different factor-graph approaches to JCED over 
frequency-selective channels have been proposed (e.g., |[26l - |[29l ). 

Our approach differs from existing factor-graph JCED designs in that it uses 1) a sparse (i.e., non- 
Gaussian) channel-tap prior, 2) a clustered (i.e., non-independent) channel-tap prior, and 3) a state-of-the- 
art SPA approximation known as "generalized approximate message passing" (GAMP), which has been 
shown to admit rigorous analysis as A^, L— )-oo |[T3l . In fact, we conjecture that the success of our method 
is due in large part to the principled approximations used within GAMP. We also note that, although 
we focus on the case of clustered-sparse channels, our approach could be applied to non-sparse (i.e., 
Gaussian) or non-clustered (i.e., independent) channel-taps or, e.g., non-sparse channels with unknown 
length L |T26l, with minor modifications of our assumed channel prior. 

Finally, we mention that this work is an evolution of our earlier work ll30ll . lISTl that was limited to 
an exactly sparse channel, that did not exploit clustering, and that was based on the "relaxed belief 
propagation" (RBP) algorithm |[32l . which has higher implementation complexity than GAMP. For 
example, the JCED scheme from |[30l . IIBTI has complexity 0{N L+N\^\), which grows with the channel 
length L. 

Our paper is organized as follows. In Section HIl we detail our assumptions on the OFDM system and the 
channel prior, and provide an illustrative example of clustered-sparse behavior with the IEEE 802.15.4a 
channel model. In Section |lll] we detail our GAMP-based JCED approach, in Section |IV] we report the 
results of our simulation study, and in Section |V] we conclude. 

Throughout the paper, we use the following notation. M denotes the field of reals and C the complex 
field. (•)* denotes conjugate and Re(-) extracts the real part. Furthermore, 5{t) denotes the Dirac delta 
waveform while denotes the Kronecker delta sequence. Also, {j)^ denotes j-modulo-A^, * 

convolution, and oc denotes equality up to a scaling. We use boldface capital letters like B to denote 
matrices and boldface small letters like h to denote vectors. / denotes the identity matrix, 1 denotes 
the vector of ones, and V{h) constructs a diagonal matrix from the vector h. For matrices and vectors. 
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(•)^ denotes transpose and {■)^ denotes conjugate transpose. When xj is a realization of random variable 
Xj, we write and use Ex^ixj} to denote the mean, YarXj{xj} the variance, pXj{xj) the pdf, 

and PXj\Dj{xj I dj) the pdf conditioned on the event Dj = dj. Sometimes we omit the subscript when 
there is no danger of confusion, yielding, e.g., E{xj}, varjxj}, p{xj) and p{xj \dj). CJ\f{x; x,u^) = 
(vrz/^)"^ exp(— |rE — xp/z^^') denotes the circular Gaussian pdf with mean x and variance u^. In fact, we 
often use [vj, Uj) when referring to the mean and variance of Vj. For a random vector x, we use Cov(a;) 
to denote the covariance matrix. 

II. System Model 

A. Tlie BICM-OFDM model 

We consider an OFDM system with N subcarriers, each modulated by a QAM symbol from a 2^^- 
ary unit-energy constellation S. Of the N subcarriers, A'^p are dedicated as pilotsQ and the remaining 
N(i = N — Np are used to transmit a total of Mt training bits and = N^M — Mx coded/interleaved 
data bits. The data bits are generated by encoding M\ information bits using a rate-i? coder, interleaving 
them, and partitioning the resulting Mq = M\/R bits among an integer number Q = Mq/M^ of OFDM 
symbols. We note that the resulting scheme has a spectral efficiency of ri = M^R/N information bits per 
channel use (bpcu). 

In the sequel, we use s^'^'^ € S for A; € {1, ... , 2*^} to denote the k^^ element of the QAM constellation, 
and c^^^ = [c"i \ ■ ■ ■ , c^mV denote the corresponding bits as defined by the symbol mapping. Likewise, 
we use Si[q] G S for the QAM symbol transmitted on the i^^ subcarrier of the q^^ OFDM symbol 
and Ci[q] = [cj^i[g], . . . , Cj^i/l?]]^ for the coded/interleaved bits corresponding to that symbol. We use 
c[q] = [cQ[q], . . . ,cx~i[q]Y to denote the coded/interleaved bits in the g*'' OFDM symbol and c = 
[c[l], . . . , c[Q]]^ to denote the entire (interleaved) codeword. The elements of c that are apriori known 
as pilot or training bits will be referred to as Cpt. The remainder of c is determined from the information 
bits fo= . . . , ftAfJ^ by coding/interleaving. 

To modulate the q^^ OFDM symbol, an -point inverse discrete Fourier transform (DFT) is applied 
to the QAM sequence s[q] = [so[(?], • • • , SAr_i[q']]^, yielding the time-domain sequence = a[q] = 

[ao[g], . . . ,aAr_i[g]]^. The OFDM waveform a{t) is then constructed using L-cyclic -prefixed versions of 

' For our GAMP decoder, we recommend A'p — 0; see Section IIVI 
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{aj[g]} and the transmission pulse gtir): 

Q N-1 

= E E «0)«[^] 9t{t-jT-q{N + L)T), (1) 

with T denoting the baud interval (in seconds) and L < N. 

The waveform a{t) propagates through a noisy channel with an impulse response /i(r) that is supported 
on the interval [rmin, Tmax]> resulting in the receiver input waveform 

r(t) = w{t) + J h{T)a{t - T)dT, (2) 

where w{t) is a Gaussian noise process with flat power spectral density A'^r,. We note that a time-invariant 
channel is assumed for simplicity. The receiver samples r{t) through the reception pulse grir), obtaining 

rj[q]= j r{t)gr{jT + q{N + L)T-t)dt, (3) 

and applies an iV-DFT $ to each time-domain sequence r[q] = [ro[g], . . . , tat^i [(7]]^, yielding the 
frequency-domain sequences ^r[q]=y[q] = [j/oM, • • • , yA^-iM]^ for g = 1 . . . Q. 

Defining the pulse-shaped channel response x{t) = {gr-k h-k g\){T), it is well known (e.g., ||33l ) that, 
when the support of x(t) is contained within the interval [0,LT), the frequency domain observation on 
the i^^ subcarrier can be written as 

yi[Q] = + Wi[q], (4) 

where Zi[q] € C is the i^^ subcarrier's gain and {u;i[(7]} are Gaussian noise samples. Furthermore, defining 
the uniformly sampled channel "taps" Xj[q]=x{jT+q{N+L)T), the subcarrier gains are related to these 
taps through the DFT: 

L-l 
j=0 

In addition, when (^r * 5't)(''") is a Nyquist pulse, {wj[g]}vi,g are statistically independent with variance 

iy'^ = No. 

To simplify the development, we assume that Q = 1 in the sequel (but not in the simulations), and 
drop the index [q] for brevity. 

B. A clustered-sparse tap prior 

Empirical studies ||3l-||6l have suggested that, when the baud rate is sufficiently large, the channel 
taps {xj} are "sparse" in that the tap distributions tend to be heavy tailed. The same empirical studies 
suggest that large taps tend to be clustered in the lag domain. Furthermore, both the sparsity and clustering 
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behaviors can be lag-dependent, such as when the receiver's timing-synchronization mechanism aligns 
the first strong multipath arrivals with a particular reference lag j. A concrete example of these behaviors 
will be given in Section III-CI 

Since our message-passing-based receiver design is inherently Bayesian, we seek a prior on the taps 
{xj} that is capable of representing this lag-dependent clustered sparsity. For this purpose, we assume a 
two-state Gaussian mixture (GM2) priori 

p{xj) = (1 - Xj)CM{xj; 0, 1/°) + XjCM{xj; 0, i/j), (6) 

where i^j > denotes the variance while in the "small" state, uj > Uj denotes the variance while in the 
"big" state, and Xj = Pr{dj = 1} denotes the prior probability of Xj being in the "big" state. Here, we 
use dj G {0, 1} to denote the hidden state, implying the state-conditional pdf p{xj \ dj) = CAf{xj; 0, i^j^)- 

For example, if Xj was presumed to be a "sparse" tap, then we would choose Xj <^ 1 and uj ^ Uj in 
If, on the other hand, Xj is presumed to be (non-sparse) Rayleigh-fading, we would choose Xj = 1 
and set uj equal to the tap variance, noting that Uj becomes inconsequential. If Xj is presumed to be 
Nakagami-fading or similar, we could fit the GM2 parameters [Xj,iJj,uj] appropriately using the EM 
algorithm, as described in 1341 p. 435]. The GM2 prior has been used successfully in many other non- 
Gaussian inference problems (see, e.g., |[35l ). and our premise here is that the GM2 model achieves a 
good balance between fidelity and tractability when modeling channel taps as well. 

To capture the big-tap clustering behavior, we employ a hidden Markov model (HMM). For this, we 
model the tap states {dj}^^Q as a Markov chain (MC) with switching probabilities p'j^ = Prjdj+i = 
0\dj = 1} and pj^ = Prjdj+i = l\dj = 0}. Here, p^^ < 0.5 implies that the neighbors of a big Xj 
tend to be big, and p^^ < 0.5 implies that the neighbors of a small Xj tend to be small. We note that 
{p'j^ iP]'^}j'Zo must be consistent with {Xj}^~Q in that the following must hold for all j: 

'l-pf pf 



A, 1-A, 



pf l-pf 



(7) 



Although we allow correlation among the tap states, we assume that the tap amplitudes are conditionally 
independent, i.e., p{xjj^i,Xj \ d^+i, dj) =p{xj \ dj)p{xj^i \ dj^i). Our experiences with IEEE 802.15.4a 
channels (see below) suggest that this is a valid assumption. 

We emphasize that the model parameters {Aj,pj^,p°^, z^j, Uj} are allowed to vary with lag j, facilitating 
the exploitation of apriori known lag-dependencies in sparsity and/or clustering. 

^ The message passing algorithm described in Section IIII-BI can also handle non-Gaussian mixtures and/or mixtures with 
more than two terms. 
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C. An illustrative example: IEEE 802.15.4a channels 

As an illustrative example of the clustered-sparse tap behavior described above, we generated realiza- 
tions of the tap vector x = [xq, . . . ,Xi_i]^ from channel impulse responses /i(r) generated according 
to the method specified in the IEEE 802.15.4a "ultra-wideband" standard |[T6l . which uses the Saleh- 
Valenzuela model lf36ll 

C K 

^(^) =Y.Y. hk,ce^'^''^^S{T - - Tk,c), (8) 

c=0 k=0 

where C denotes the number of clusters, Tc the delay of the c*'^ cluster, K the number of components per 
cluster, {rfc c} the relative component delays, {/ifc.c} the component amplitudes, and {ipk^c} the component 
phases. In particular, the 802.15.4a standard specifies the following. 

• The cluster arrival times are a Poisson process with rate A, i.e., p{Tc \ Tc_i) = Aexp(— A(Tc— Tc_i)). 
The initial cluster delay Tq > r^m, as seen by the receiver, is a function of the timing synchronization 
algorithm. 

• The component arrivals are a mixture of two Poisson processes: p(Tfc ckfc-i,c) = /5Ai exp(— Ai(rfc 
Tfe-i,c)) + (l - /5)A2 exp{-X2{Tk,c - rfc-i,c)) with ro,c = 0. 

• The component energies obey 

T,n, |2; _ exp(-r;/r - Tk,l/j) 

7[(1 - /3)Ai + /?A2 + IJ 
where T is the cluster decay time constant and 7 is the intra-cluster decay time constant. 

• The amplitudes {hk^c} are i.i.d Nakagami with jn-factors randomly generated via i.i.d m ~ 

Af{mo,ml). 

• The phases {(t)k,c} are i.i.d uniform on [0,2it). 

• The number of clusters, C, is Poisson distributed with mean C, i.e., p(C) = (C)*^ exp(— C')/(C!). 

• The number of components per cluster, K, is set large enough to yield a desired modeling accuracy. 

Beyond the above specifications, we assume the following. 

• The parameters {A, Ai, A2, /3, F, 7, mo,mQ,C} are set according to the 802.15.4a "outdoor NLOS" 
scenario |[T6l . 

• K = 100 components per cluster are used. 

• The pulses g\{T) and gr{T) are square -root raised cosine (SRRC) designs with parameter 0.5. 

• The system bandwidth equals T"^ = 256 MHz. 

• The number of taps (and CP length) was set at L = 256 (implying a maximal delay spread of 
1 vsec) in order to capture all significant energy in h{T). 
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• The initial delay was generated via Tq = Lpr^T + Tq, where Lpre = 20 and where Tq is exponentially 
distributed with mean T, i.e., p(Tb) = Aq exp(— AqTo) for Ao = l/T. Here, Lpre was chosen so that 
{xj}j^Q captures the "pre-cursor" energy contributed by the pulse shape, while Aq models a positive 
synchronization uncertainty. 

We now show results from an experiment conducted using U = 10000 reahzations of the tap vector 
X. In Fig. [T] we show histograms of R,e{xj) for lags j G {5,23, 128,230}. There it can be seen that 
the empirical distribution of Re(xj) changes significantly with lag j: for pre-cursor lags j < Lpre, it is 
approximately Gaussian; for near-cursor lags j Lpre, it is approximately Laplacian; and, for post-cursor 
lags j ^ LprQ, it is extremely heavy-tailed. In Fig. |2j we show a typical realization of x and notice 
clustering among the big taps. For comparison, we also plot an empirical estimate of the power-delay 
profile (PDP) p = [po, . . . , in Fig. |2l where pj = E{\xj\^}. 

Next, we fit the GM2 parameters {Xj, Uj, i'j}j~Q from the realizations {xu}u=i using the EM algorithm 
ll34l p. 435], which iterates the steps ([T0l)- ([T3] ) until convergence: 

'^■?> ~ (l-A,)CAr(xj,„;0,i^»)+AjCA^fe,„;0,i'|) 

= E«=i ^j,u I Xj,u I V E«=i ^j,u vj (11) 
^° = Eu=ii^-^j,u)Ku\VEu=ii^-^j,u) vj (12) 

= ljT.u=i^j,u Vj. (13) 

Above, ujj^u is the posterior on the state dj^u of tap Xj^u, i-e-> ^j,u = Pr{c^j,ti = ^\xj^u] 
The EM-estimated big-variance profile = [u^, . . . , and small- variance profile are shown in 

Fig. |2j while the sparsity profile A = [Aq, • • • , Ai_i]^ is shown in Fig. |3] Not surprisingly, the best-fit 
GM2 parameters also change significantly with lag j. In particular, as j becomes larger, the variance 
ratio i^j /i^j increases while the big-tap-probability Xj decreases, corresponding to an increase in sparsity. 
Meanwhile, there exists a peak in Xj near j = Lpre that results from synchronization. 

Next, we empirically estimated the switching probabilities p"^ = [p^^ , . . . , p^^_^]^ and p^" using 
maximum a-posteriori (MAP) state estimates, i.e., dj^u = ['^j,u + 0.5J. In particular, 

Pf = J2u=l „=0&d, „=1}/ S«=l „ = 1} <^14) 



'{dj+i,„=0&(ij,„=l}/ ^u=l -^{dj,„ = l} 
= Sn=l l{d;+i,„=l&d,_„=0}/S«=l l{d,,„=0}' ^^^^ 

where 1{^} denotes the indicator function for event A. From the plots in Fig. [3l we see that the estimated 



switching probabilities are lag-dependent as well. 
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Fig. 1. Histograms of 'Re{xj) for lags j G {5, 23, 128, 230}, with "tight" axes. With synchronization delay Lpre = 20, note 
that the histogram appears Gaussian for j<ipre, Laplacian for j~ I/pre, and very sparse for j^ipre- 



Finally, using the MAP state estimates {dj,u}, we empirically estimated the normalized conditional 
correlation 



J2"=i i{dj^i „=i.dj „=i}^j+i,"^j,, 



and found that the magnitudes were < 0.1, validating our assumption of conditionally independent tap 
amplitudes. 

In summary, we see that IEEE 802.15.4a channels do indeed yield taps with the lag-dependent clustered 
sparsity described in Section III-BI Moreover, we have shown how the GM2-HMM parameters can be 
estimated from realizations of x. Next, we propose an efficient factor-graph based approach to joint 
channel-estimation and decoding (JCED) for BICM-OFDM using the GM2-HMM prior proposed in 
Section HTbI 



III. Joint Channel Estimation and Decoding 

Our goal is to infer the information bits b from the OFDM observations y and the pilot/training bits Cpt, 
without knowing the channel state x. In particular, we aim to maximize the posterior pmf p{bm I Cpt) 
of each info bit. To exploit prior knowledge that x is clustered-sparse, we employ the GM2-HMM prior 
described in Section HFB] As a result, the info-bit posterior can be decomposed into the following product 
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100 150 
lag j [baud] 



Fig. 2. A sample realization of channel taps {xj} generated from the IEEE 802.15.4a model with SRRC pulse shaping. Also 
shown is the empirically estimated PDP, best fits of the GM2 parameters 
hidden state dj given the tap value Xj. 



Vj,Vj~\, and the MAP threshold for detecting the 



of factors: 



p{bm\y,c^x) = ^p{h\y,c^x) oc ^ p(y | 6, Cpt)p(b) (16) 
^P{y I s, x)p{x I d)p{d)p{s I c)p{c I b, cpt)p(6) 

Mi 



d i=0 s j=0 

Mi 

X 

c 6-m m=l 

where fo_rn — [^i, • • • > ^m-i> ^m+i> • • ■ , ^A/i]^- This factorization is illustrated by the /acfor graph in 
Fig. m where the round nodes represent random variables and the square nodes represent the factors of 
the posterior exposed in ([TT]) . 

A. Background on belief propagation 

Although exact evaluation of the posteriors {p{hm\y,c^\)} is computationally impractical for the 
problem sizes of interest, these posteriors can be approximately evaluated using belief propagation (BP) 



June 7, 2011 



DRAFT 



11 




uniform info code & piiots & coded symboi QAM OFDIVI cfiannei sparse tap ciuster 
prior bits interiv training bits mapping symbs observ taps prior states prior 




Fig. 4. Factor graph of the JCED problem for a toy example with M\ = 3 information bits, A^p = 1 pilot subcarrier (at 
subcarrier index i = 3), Mt — 2 training bits, Af = 2 bits per QAM symbol, N = 4 OFDM subcarriers, and channel impulse 
response length L = 3. 



||37l on the factor graph in Fig. ID In textbook BP, beUefs take the form of pdfs/pmfs that are propagated 
among nodes of the factor graph via the sum/product algorithm (SPA) |[T8l - |[20l : 

1) Say the factor node / is connected to the variable nodes {va}'^^i. The belief passed from / to Vb 

(vb) oc J[^,^}^^J{vi, ■ ■ ■ ,VA)Ylaj^bPva^f{va), givcn the beliefs {p^^^f{-)}a^b recently 
passed to /. 

2) Say the variable node v is connected to the factor nodes {/i, . . . , /b}- The belief passed from v 
to fa is p^^f^v) OC Wb^aPh-^viv), givcn the beliefs {pft,^v{-)}b^a recently passed to v. 
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3) Say the variable node v is connected to the factor nodes {/i, . . . , Jb}- The posterior on v is the 
product of all recently arriving beUefs, i.e., p{v) oc Y[b=iPfb~>v{v)- 

When the factor graph contains no loops, SPA-BP yields exact posteriors after two rounds of message 
passing (i.e., forward and backward). But, in the presence of loops, convergence to the exact posteriors is 
not guaranteed ||25l . That said, there exist many problems to which loopy BP |[37l has been successfully 
applied, including inference on Markov random fields ll38l . LDPC decoding |[T4l . and compressed sensing 
|[T3l . ifTSl . ll32l . |[39l - ll4TI . Our work not only leverages these past successes, but unites them. 

B. Background on GAMP 

An important sub-problem within our larger bit-inference problem is the estimation of a vector of 
independent possibly-non-Gaussian variables x that are linearly mixed via $ € £^NxL form z = ^x = 
[zq, . . . , zn^i]^, and subsequently observed as noisy measurements y through the possibly non-Gaussian 
pdfs {py^\zX- I ■)}f=o^- ou'" case, Q specifies a GM2 prior on Xj and @ — given the finite-alphabet 
uncertainty in Si — yields the non-Gaussian measurement pdf PY^\Zi- This "linear mixing" sub-problem 
is described by the factor graph shown within the middle dashed box in Fig. |4l where each node 
represents the measurement pdf py.|z. and the node rightward of each node "xf represents the GM2 
prior on Xj. 

Building on recent work on multiuser detection by Guo and Wang H2l . as well as recent work on 
message passing algorithms for compressed sensing by Donoho, Maleki, Montanari, and Bayati POl . 
Pn . Rangan proposed a so-called generalized approximate message passing (GAMP) scheme that, for 
the sub-problem described above, admits rigorous analysij^ as N,L^oo llT3l . The main ideas behind 
GAMP are the following. First, although the beliefs flowing leftward from the nodes {xj } are clearly non- 
Gaussian, the corresponding belief about Zi = XljS} ^ij^j '^^^ accurately approximated as Gaussian, 
when L is large, using the central limit theorem. Moreover, to calculate the parameters of this distribution 
(i.e., its mean and variance), only the mean and variance of each xj are needed. Thus, it suffices to pass 
only means and variances leftward from each Xj node. It is similarly desirable to pass only means and 
variances rightward from each measurement node. Although the exact rightward flowing beliefs would 
be non-Gaussian (due to the non-Gaussian assumption on the measurement channels py^|^.), GAMP 
approximates them as Gaussian using a 2nd-order Taylor series, and passes only the resulting means 

' Since it is difficult to give a concise yet accurate account of GAMP's technical properties, we refer the interested reader to 
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definitions: 












Pyi|Zi(!/l^)C.'^(2;i,i'^) 


(Dl) 


Pz,\Y,{z\y;z,u'') 






J'^, py-. |z. (h|z')CA/'(z';z,i^^) 




= 




(D2) 




= 




(D3) 




— 




(D4) 








(D5) 






ITT L |a; - 9\n,j (f, px^{x; f, v'') 


(D6) 


initialize: 










= 


J^xpx,{x) 


(11) 




= 


Jjx-x,{l)\^px,{x) 


(12) 


Vi : Mi(0) 


= 





(13) 


forn= 1,2,3,... 








Vi : Zi{n) 






(Rl) 


Vi : u!{n) 






(R2) 


Vi : Pi{n) 




ii(n) — !/~(n) iii(n - 1) 


(R3) 


Vi : Ui{n) 






(R4) 


Vi : 




-3out,i(yi.Pi(7i),!/f (n)) 


(R5) 


Vj : u^in) 




(EtoM4>.f-?(n))-^ 


(R6) 


Vj : fj{n) 






(R7) 


Vj : I'jin+l) 






(R8) 


Vj : a;j(n+l) 






(R9) 


end 









TABLE I 
The gamp Algorithm 



and variances. A further simplification employed by GAMP is to approximate the differences among the 
outgoing means/variances of each left node, and the incoming means/variances of each right node, using 
Taylor series. The GAMP algorithrqj is summarized in Table Jl 

C. Joint estimation and decoding using GAMP 

We now detail our application of GAMP to joint channel-estimation and decoding (JCED) under the 
GM2-HMM tap prior, frequently referring to the factor graph in Fig. ID 

To be precise, the GAMP algorithm in Table|I]is an extension of that proposed in II13I . TableUhandles circular complex-valued 
distributions and non-identically distributed signals and measurements. 
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Because our factor graph is loopy, there exists considerable freedom in the message passing schedule. 
Roughly speaking, we choose to pass messages from the left to the right of Fig. |4] and back again, several 
times, stopping as soon as the messages converge. Each of these full cycles of message passing will be 
referred to as a "turbo iteration." However, during a single turbo iteration, there may be multiple iterations 
of message passing between the GAMP and MC sub-graphs, which will be referred to as "equalizer" 
iterations. Furthermore, during a single equalizer iteration, there may be multiple iterations of message 
passing within the GAMP sub-graph, while there is at most one forward-backward iteration within the 
MC sub-graph. Finally, the SISO decoding block may itself be implemented using message passing, in 
which case it may also use several internal iterations. The message passing details are discussed below. 

At the start of the first turbo iteration, there is total uncertainty about the information bits, so that 
Pr{6m = 1} = ^ Thus, the initial bit beliefs flowing rightward out of the coding/interleaving block 
are uniformly distributed. Meanwhile, the pilot/training bits are known with certainty. 

Coded-bit beliefs are then propagated rightward into the symbol mapping nodes. Since the symbol 
mapping is deterministic, the corresponding pdf factors take the form p{s^'^^ |c(')) = 5k-i- The SPA 
dictates that the message passed rightward from symbol mapping node ''M." takes the form 

M 

PM.^sAs^"^) ^ P(S«|C) nPc_^M.(Cm) (18) 

M 

= iip,^^,^^m.{c^J:^^), (19) 

m=l 

which is then copied forward as the message passed rightward from node Si (i.e., pjii.^s,{s^''^) = 

Recall, from Section IIII-BI that the symbol-belief passed rightward into the measurement node "y" 
determines the pdf Py,\z, used in GAMP. Writing this symbol belief as /3j = . . . , /S^-'^'^]^ for 

- Psi^Viis'-'"''), equation © implies the measurement pdf 

|S| 

PY.\zM^) = j;/3f CAA(y;.('=)z;i.-). (20) 

k=l 

From (|20l ). it is shown in Appendix lAl that the quantities in (D2)-(D3) of Table H become 

5out,i(y,^,'^^) = ^ei(y,z,z/'=) (21) 
5ouu(y'^.'^ ) = ^ ; 1 (22) 
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for 



lc{fc)|2,,2 



l<'>fe,i,^') A (i-£)c<"(f') (25) 

|S| 



k=l 



fc=l 



where = • • • > C^'^'^]^ characterizes the posterior pmf on Si under the channel model Zi ~ 
CJ\f{z,u^). Likewise, from it is shown in Appendix IB] that the quantities (D5)-(D6) take the form 

9\n,jir, v') = (aj 7] + (1 - "i) if) r (28) 
5i'n,,(r, v') = a,(l - a,-)(7l " 7°)' l^lV'^' + o^jl] + (1 " "j)7°, (29) 
for 70(1^'") = (1 + z^7i/°)-^ and 7](zy^) = (1 + and 

«j(^^') = z ^ ^ -TT- (30) 



1 + 



1 - \j CM{f] 0, + i^' 



-cf /:f(f,i/'-) 

Above, C^^" is the apriori likelihood ratio p^|j^~Q| on the hidden state, >C^'*'(f, v''') is GAMP's extrinsic 
likelihood ratio, and aj(r, v"^) is the corresponding posterior probability that dj = 1. 

Using (l2Tll-(l30ll, the GAMP algorithm in Table U is iterated until it converges^ In doing so, GAMP 
generates (a close approximation to) both the conditional means x and variances iv^ = [i/q , . . . , 
given the observations y, the soft symbol priors (3= [(3q, . . . ,(3i^_iY and the sparsity prior A. Conve- 
niently, GAMP also returns (close approximations to) both the conditional means z and variances of 
the subchannel gains z, as well as posteriors ^ = [^o; • • • ;^L-i]^ t^e symbols s. 

Before continuing, we discuss some GAMP details that are specific to our OFDM-JCED application. 
First, we notice that, to guarantee that the variance vf{n) in (R5) is positive, we must have i/f < z/^ in 



1 v^L-l 

llltlLC UiilCiCIlL;c 

a threshold or a maximum number of GAMP iterations has elapsed. 



^ More precisely, GAMP is iterated until the mean-square tap-estimate difference -jj Tl'j^o 

(n- 1)1^ falls below 
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(l22l ). Since this is not necessarily the case during the first few GAMP iterations, we clip vf at the value 
0.99z/^, where 0.99 was chosen heuristically. Second, due to unit-modulus property of the DFT elements 
^ij, step (R2) in Table |I] simplifies to Vi{n) = '}2j i^ji^) and (R6) simplifies to i/J(n) = ^■ 
With these simplifications, the complexity of GAMP is dominated by either the matrix-vector products 
J2j ^ij^ji^) i'^ (1^1) ^^'^ J2i ^ij'^ii^) i'^ which can be implemented using a log2 A^-multiply 

FFT when A^ is a power-of-two, or by the calculation of {cj, ff }^q^ in (I26l)-(l27l). which requires 0(A^|S|) 
multiplies. Thus, GAMP requires only 0(A^log2 A^ + A^|S|) multiplies per iteration. 

After the messages within the GAMP sub-graph have converged, tap-state beliefs are passed right- 
ward to the MC sub-graph. In particular, the SPA dictates that GAMP passes tap-state likelihoods or, 
equivalently, the extrinsic likelihood ratios C^^^. Since the MC sub-graph is non-loopy, only one iteration 
of forward-backward message passing is performed]^ after which the resulting tap-state likelihoods are 
passed leftward back to GAMP, where they are treated as tap-state priors A in the next equalizer iteration. 
This interaction between the GAMP and MC sub-blocks can be recognized as an incarnation of the 
structured-sparse reconstruction scheme recently proposed by the authors in (15 L 

When the tap-state hkehhoods passed between GAMP and MC have convergedO the equalizer iterations 
are terminated and messages are passed leftward from the GAMP block. For this, SPA dictates that a 
symbol-belief propagates leftward from the node with the form 



p.^^y^s) ^ CN{y^;sz,v'")CN{z-Zi.vf) (31) 

J z 

= CM{yi;shAs\^vt + ^'"), (32) 

where (zj, uf) play the role of soft channel estimates. The SPA then implies that p^.^s, {s) = Ps.^yt (s). 
Next, beliefs are passed leftward from each symbol-mapping node A4i to the corresponding bit nodes 

* Message passing on the MC factor graph is a standard procedure. For details, we refer the reader to 1141 . 1341 . 
' More precisely, the equalizer iterations are terminated when the mean-square difference in tap-state log-likelihoods falls 
below a threshold or a maximum number of equalizer iterations has elapsed. 
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. From the SPA, they take the form 



ISI 





(33) 




(34) 



for pairs (i, m) that do not correspond to pilot/training bits. (Since the pilot/training bits are known with 
certainty, there is no need to update their pmfs.) 

Finally, messages are passed leftward into the coding/interleaving block. Doing so is equivalent to feed- 
ing extrinsic soft bit estimates to a soft-input/soft-output (SISO) decoder/deinterleaver, which treats them 
as priors. Since SISO decoding is a well-studied topic |[T4l . P3l and high-performance implementations 
are readily available (e.g., f44l ). we will not elaborate on the details here. It suffices to say that, once the 
extrinsic outputs of the SISO decoder have been computed, they are re-interleaved and passed rightward 
from the coding/interleaving block to begin another turbo iteration. These turbo iterations continue until 
either the decoder detects no bit errors, the soft bit estimates have converged, or a maximum number of 
iterations has elapsed. 

IV. Numerical Results 

In this section, we present numerical results that compare JCED using our GAMP-based scheme to 
that using soft-input soft-output (SISO) equaUzers based on Unear MMSE (LMMSE) and LASSO 0, 
as well as to performance bounds based on perfect channel state information (CSI). 



For all results, we used irregular LDPC codes with codeword length ~ 10000 and average column 
weight 3, generated (and decoded) using the publicly available software P4l . with random interleaving. 
We focus on the case of A^ = 1024 subcarrier OFDM with 16-QAM (i.e., M = 4) operating at a spectral 
efficiency of 7? = 2 bpcu. For bit-to-symbol mapping, we used multilevel Gray-mapping 1451 . noting recent 
work 061 that conjectures the optimality of Gray-mapping when BICM is used with a strong code. In 
some simulations, we used A'^p > pilot-only subcarriers and Mt = interspersed training bits, whereas 



A. Setup 
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in others we used A'^p = and Mt > 0. When A'^p > 0, the pilot subcarriers were placed randomly and 
modulated with (known) QAM symbols chosen uniformly at random. When M\ > 0, the training bits 
were placed at the most significant bits (MSBs) of uniformly spaced data-subcarriers and modulated with 
the bit value 1. 

Realizations of the tap vector x[q] were generated from IEEE 802.15.4a outdoor-NLOS impulse 
responses and SRRC pulses, as described in Section III-C[ and not from the GM2-HMM model. The 
tap vectors generated for our simulations are thus as realistic as one can hope to obtain in software. All 
reported results are averaged over 5000 channel realizations (i.e., 10^ info bits). 

The GM2-HMM parameters i^^ , , p^^ , p^^ were fit from 10000 realizations of the tap-vector x 
using the procedure described in Section III-CI In doing so, we implicitly assumecj^ that the receiver is 
designed for the outdoor scenario, and we leverage the prior information made available by the extensive 
measurement campaign conducted for the IEEE 802.15.4a standard |[T6l . In all cases, we used a maximum 
of 10 turbo iterations, 5 equalizer iterations, 15 GAMP iterations, and 25 LDPC decoder iterations, 
although in most cases the iterations converged early (as described in Section IIII-CI) . 

B. Comparison with other schemes 

The proposed GAMP-based equalizer was compared with soft-input soft-output (SISO) equalizers 
based on LMMSE and LASSO 111, whose constructions are now detailed. 

All SISO equalizers are provided with the soft inputs s[q] and v^[q\, i.e., the means and variances, 
respectively, of the symbols s[q] € S^. (Note that, if certain elements in s[q] are known perfectly as 
pilots, then the corresponding elements in v^[q] will be zero-valued.) Then, writing s[q] = s[q] + s[q], 
where s[q] an unknown zero-mean deviation, the subcarrier observations y[q] = 'D{s[q])^x[q] + wlq] 
can be written as 

y[q]=V{s[q])^x[q]+v[ql (35) 

where v[q] = 'D{s[q])^x[q] + w[q] is a zero-mean noise. Treating the elements within s[q] as uncor- 
related and doing the same with x[q], and leveraging the fact that $ is a truncated DFT matrix, it is 
straightforward to show that Cow{v[q]) = ^{u^'lq]) with ^"[q] = v^l + {1^ p)u^[q], where p denotes 

^ If, instead, we knew that the receiver would be used in a different operating scenario, then we could generate representative 
realizations of x for that scenario and fit the GM2-HMM parameters accordingly. Furthermore, one could optimize the receiver for 
any desired balance between "typical" and "worst-case" operating conditions by simply choosing appropriate training realizations 

X. 
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the channel's PDP. Without loss of generality, (1351 ) can then be converted to the equivalent white-noise 
model 

u[q] ^ V{u-[q])-'^y[q] = Ax[q] + n[q], (36) 

where Cov(n[g]) = / and A[q] = 'D{u"[q]^2s[q])^ is a known matrix. In summary, ( [36l l provides a 
mechanism to handle soft inputs for both LASSO and LMMSE. 
For LMMSE equalization, we first used (l36l ) to compute 

i^lmmseM = ViP)A%]{A[q] V{p)A%] + (37) 

from which we obtain the subcarrier gain estimate zimmsel?] = *a;immse[9]- The covariance matrix of 
^ImmseM is 

* (P(p) - P(p) [q] {A[q] P(p) [q] + l)'' A[q] V{p)) 

whose diagonal elements i^frnmsel^] variances on the gain estimates i;immse[9]- Finally, we obtain soft 
symbol estimates from the soft gain estimates (i;|mmse['?]5 ^fmmseb]) <l32l ). 

For LASSojfl we first computed the tap estimate ^lassol?] from ( [36l ) using the celebrated SPGLl 
algorithm | t47T| . In doing so, we needed to specify the target residual variance, i.e., y^^^Q = ;^||'u[q'] — 
A[q]a;iasso[g] ||2- Because Cov(n[g]) = /, we expect the optimal value of z^iaggo to be near 1 and, after 
extensive experimentation, we found that the value J^iasso works well at high SNR and that the value 
i/|gggp = 1.5 works well at low SNR. Thus, for each u[q], we computed SPGLl estimates using each of 
these twc0 targets, and kept the one that minimized the squared error I'lassoM ~ xll^l^] ~ ^lassofe] Hi' 
which we assume a genie is able to provide. For the soft outputs, we set ^lassol?] = *^lasso[Q'] take 
^iassol^] diagonal elements of * Cov(a;iasso[?])*'^- Assuming Cov(£|asso[Q']) = ^lassol^]-^ ^'^'^ 

leveraging the fact that * is a truncated DFT matrix, we find I'lassob] ~ ^^lassol^]-'-- Finally, using ( [32l ). 
we obtain soft symbol estimates from the soft gain estimates (2iasso['7]i '^fassoM)- genie-aided 
steps, the performance attained by our LASSO implementation is better than what could be obtained in 
practice. 

These LMMSE- and LASSO-based SISO equalizers were then embedded in the overall factor graph 
in the same manner as GAMP, with the following exceptions: 1) The LMMSE and LASSO algorithms 
could not be connected to the MC sub-block, since they are not based on a two-state mixture model; 

' The criterion employed by LASSO jpl is equivalent to the one employed in "basis pursuit denoising" (10|. 

We also tried running SPGLl for a dense grid of V\^^so values, but often it would get "stuck" at one of them and eventually 
return an error. 
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2) For LASSO, if the genie-aided MSB i^iassol^] '^^'^ improve during a given turbo iteration, then 
the corresponding outputs (i:iasso[9]) ^^i^assoM) were not updated. This rule was employed to prevent 
turbo-LASSO from occasionally diverging at low SNR; 3) For LASSO, if A^p > and Mt = 0, then 
the LASSO estimates computed during the first turbo iteration use only pilot subcarriers. This makes 
the performance of SISO-LASSO after the first turbo iteration equal to the performance of the standard 
pilot-aided LASSO. 

C. BER versus the number of pilot subcarriers Np 

Figure [5] shows bit error rate (BER) versus the number of pilot subcarriers A'^p at £'5/^^0 = 1! dB and 
a fixed spectral efficiency of 77 = 2 bpcu. In this and other figures, "ALG-#" refers to algorithm ALG with 
# turbo iterations (and "ALG-fin" after turbo convergence; see Fig. [TOl ) with the MC block disconnected 
(i.e., there was no attempt to exploit tap clustering). Meanwhile "GAMP-# MG-5" refers to GAMP-i-MC 
after # turbo iterations, each containing 5 equalizer iterations. Finally, PCSI refers to MAP equalization 
under perfect CSL which yields a bound on the BER performance of any equalizer. 

The curves in Fig. [5] exhibit a "U" shape because, as Np increases, the code rate R must decrease 
to maintain the fixed spectral efficiency rj = 2 bpcu. While an increase in A'^p generally makes channel 
estimation easier, the reduction in R makes data decoding more difficult. For all schemes under com- 
parison. Fig. |5] suggests that the choice Np ?a 224 is optimal under the operating conditions. Overall, we 
see GAMP significantly outperforming both LMMSE and LASSO. Moreover, we see a small but definite 
gain from the MC block. 

D. BER versus the number of interspersed training bits Mt 

Although A^p > pilot subcarriers are required for decoupled channel estimation and decoding, JCED 
can function with A^p = as long as a sufficient number Mt of training bits are interspersed among the 
coded bits used to construct each QAM symbol. To examine this latter case. Fig. [6] shows BER versus 
Mt at Ei)/No = 10 dB, a fixed spectral efficiency of r/ = 2 bpcu, and A^p = 0. Again we see the "U" 
shape, but with GAMP working very well for a relatively wide range of Mt, and again we see a small 
but noticeable BER improvement when the MC block is used. SISO-LASSO seems to work to some 
degree with A^p = 0, but SISO-LASSO does not. 

E. BER versus E^/No 

Figure |7] shows BER versus Ei,/No using A^p = 224 pilot subcarriers (as suggested by Fig. [5]l and 
Mt = training bits. Relative to the perfect-CSI bound, we see SISO-LASSO performing within 5 dB 
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Fig. 5. BER versus number of pilot subcarriers Np, for Ei,/No = 11 dB, Mt = training bits, rj = 2 bpcu, and 16-QAM. 



during the first turbo iteration and within 4.5 dB after convergence. Meanwhile, we see SISO-LMMSE 
performing very poorly during the first turbo iteration, but eventually surpassing SISO-LASSO and coming 
within 4 dB from the perfect-CSI bound. Remarkably, we see GAMP+MC performing within 0.6 dB 
of the perfect-CSI bound (and within 1 dB after only 2 turbo iterations). This excellent performance 
confirms that the proposed GM2-HMM channel model and equalizer design together do an excellent job 
of capturing and exploiting the lag-dependent clustered-sparse characteristics of the 802.15.4a channel 
taps. Comparing the GAMP traces to the GAMPh-MC traces, we see that the MC block yields a small 
but noticeable benefit. 

Figure [8] shows BER versus Eh/No using Mt = 448 interspersed training bits (as suggested by Fig. |6ll 
and A'^p = pilot subcarriers. There we see that SISO-LASSO does not perform well at all. SISO- 
LMMSE works to some degree after several turbo iterations, although not as well as in the A'^p > 
case. Meanwhile, we see GAMP-i-MC performing within 1 dB of the perfect-CSI case, and GAMP alone 
performing within 1.5 dB. Comparing Fig. [8] to Fig. |2l we see GAMP with training bits performing 
about 1 dB better than GAMP with dedicated pilot subcarriers. The perfect-CSI bound likewise improves 
because, with 16-QAM, Mt = 448 training bits constitutes half the overhead of A'^p = 224 pilot subcarriers, 
allowing Fig. [8] the use of a stronger code at r] = 2 bpcu. 
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Mt 

Fig. 6. BER versus number of interspersed training bits Alt, for E^/No = 10 dB, A'^p^O pilots subcarriers, rj = 2 bpcu, and 
16-QAM. 

F. Channel-tap NMSE versus Eb/No 

Figure |9] shows the channel estimates' normalized mean-squared error (NMSE) E{||a;[(7]— |||} 
versus Eb/Ng, at the point that the turbo iterations were terminated, using A'^p = 224 pilot subcarriers 
and Mt = training bits. (For comparison, Fig. |7] shows BER for this configuration.) We also show the 
NMSE attained by the "bit and support genie" (BSG), which calculates MMSE channel estimates using 
perfect knowledge of both the coded bits and the hidden channel states {dj}, and which provides a lower 
bound for any channel estimator. In the figure, we see that the NMSEs of LMMSE and LASSO channel 
estimates are within 8-to-12 dB of the BSG, whereas those of GAMP are within 2-to-4 dB. Meanwhile, 
we see that GAMP+MC has a small but noticeable advantage over GAMP alone. We reason that the 
LMMSE estimates are worse than the GAMP estimates because they do not exploit the non-Gaussianity 
of the channel taps Xj, and the LASSO estimates are worse than the GAMP estimates because they do 
not exploit the known priors on the channel taps (i.e., the lag-dependent sparsity A and PDP p). 

G. Computational complexity versus E^/Nq 

Figure [10] shows the average time per turbo iteration (in Matlab seconds on a 2.6GHz CPU), the 
average number of turbo iterations, and the average total time (to turbo convergence), as a function of 



June 7, 2011 



DRAFT 



23 



10° 




Eb/No [dB] 

Fig. 7. BER versus Eb/No, for A'p = 224 pilot subcarriers, M\ — i} training bits, ■q = 2 bpcu, and 16-QAM. 




Eb/No [dB] 

Fig. 8. BER versus Eb/No, for A'p^O pilot subcarriers, Mt = 448 training bits, r; = 2 bpcu, and 16-QAM. 



Eb/No, using A''p = 224 pilot subcarriers and Mt = training bits. (For comparison, Fig. |7] shows BER 
for this configuration and Fig. |9] shows NMSE.) Regarding the average time per turbo iteration, we see 
GAMPzbMC taking sa 1.5 sec at low Eb/No and ?a 0.5 sec at high Eb/No. GAMP+MC takes only sUghtly 
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Eb/No [dB] 

Fig. 9. Channel tap NMSE versus Et/No, for iVp = 224 pilot subcarriers, A/t = training bits, 77 = 2 bpcu, and 16-QAM. 

longer than GAMP alone due to the efficiency of the message computations within the MC block, and 
the fact that both the GAMP iterations and equalizer iterations are terminated as soon as the messages 
converge. In comparison, SISO-LMMSE takes 4.5 sec per turbo iteration, and SISO-LASSO takes 
between 1 and 7 sec, depending on E{,/No. Regarding the number of average number of turbo iterations 
until convergence, we see that — at low Eb/No — GAMP+MC takes about 5 turbo iterations, GAMP alone 
takes about 7, SISO-LMMSE takes about 5, and SISO-LASSO takes about 3, while— at high £^;,/iVo— all 
algorithms converge after only 1 turbo iteration. Regarding the total time for equalization, GAMP-i-MC 
and GAMP are about the same at low Eh/ No, whereas GAMP alone takes about 30% less time at high 
Eb/No. Meanwhile, SISO-LASSO and SISO-LMMSE are uniformly slower than GAMP and GAMP-hMC 
over the entire E^/No range, in some cases by a factor of 10. 

V. Conclusion 

In this paper, we presented a factor-graph approach to joint channel-estimation and decoding (JCED) 
for BICM-OFDM that merges recent advances in approximate message passing algorithms (13] with 
those in structured-sparse signal reconstruction lITSi and SISO decoding |[T4l . Different from existing 
factor-graph approaches to JCED, ours is able to exploit not only sparse channel taps, but also clustered 
sparsity patterns that typify large-bandwidth communication channels, such as those that result from 
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Fig. 10. Average time per turbo iteration (top), average number of turbo iterations (middle), and average total time (bottom), 
versus Eb/No, for A^p = 224 pilot subcarriers, Mt = training bits, rj — 2 bpcu, and 16-QAM. 



pulse-shaped communication over IEEE 802.15.4a modeled channels. For this purpose, we proposed 
the use of a two-state Gaussian mixture prior with a Markov model on the hidden tap states. The 
implementation complexity of our JCED scheme is dominated by 0{N log2 N + N\§\) multiplies per 
GAMP iteration, facilitating the application to systems with many subcarriers and many channel taps 
L < N. Experiments with IEEE 802.15.4a modeled channels showed BER performance within 1 dB of 
the known-channel bound, and 3-4 dB better than LMMSE- and LASSO-based soft equalizers. These 
experiments also suggested that, with our proposed approach, the use of interspersed training bits is more 
efficient than the use of dedicated pilot subcarriers. For very large constellations (e.g., |S| = 1024), future 
work is motivated to reduce the linear complexity dependence on |S|. 

Appendix A 

Derivation of GAMP Functions ^rout,* and 5^^, ■ 

In this appendix, we derive the GAMP quantities gou\,i{y, and g'^^^ ^{y , z , u^) given in (|2TI)-(|26]). 

From (Dl), we have that 

PY, [y) Jz 
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where PyXv) — /^Py.jz, (y|-2^)CA/'(2:; z, zy^). From (l20l ). we rewrite PYi\Zi{y\z) as 

^>'=I^.(^I^)=E3^CAA(.;-|^,^), (39) 
fe=i ' ' 

so that 

/ .Py.|z.(yk)CAA(z;z,.^~) = / .CAr(z;-|^,-^)cAr(.;i,.^) (40) 

fc=i ^ 

Using the property that 

^ , , )cA/-(0;g^-^,.^ + .^), (42) 

we can rewrite 

^Py,|z,(y|2)C7V(z;z,i^^) 

fc=l ' ' ^ jy™ i/^ u"' ~'~ I/' 

fc=l ' ' J/™ 



A;=l 



and, using the same procedure, we get 

2 

fc=l 

,(fe) 



fc=i 

With erny, 

z, u^) defined in (l23l) . equations (1381 ) and (1451 ) and (l46l ) combine to give 

Ez.|y.{^|y;^,i^n = (y,i,z^^~)(eW(y,z,i/^) + z). (47) 



fe=i 

Finally, from ( [47] ) and the definition of goui,i{y, z,^^) in (D2), equation (|2TI ) follows immediately. 
From (Dl), we have that 



varz^|yjz|y;z,i/^} = — ^ ^ |z - E^^|yjz | y; £, py^l^^ (yja:) C7\A(z; i, i^""). (48) 
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Similar to (1431) . we can write 

fc=i 



y wn' _ 

(49) 

Then, using the change-of- variable z = z — E^^|y.{z \y; z,!^^}, and absorbing the s^'^^ terms as done in 
(|45] ). we get 



^ k - Ez,|y {z I y, z, y^}\^PY,\z, {y\z) CM{z\ i, i/^ 

2A-f 

= J3/3<"CA/-(!/.;s(*'l, |»<'>|V' + ^«') 



k=l 

X I \~z\'eM[z-e^') + z - E,^^y^{z\y;z,.^} , ^^^^^^^^^^ ^ (50) 

= -ei 

k=l 

Using ^l''\y, z,ij'') defined in (l23]l and (('^^(y, z, i/^) defined in (O, equations (061) and ([48]) and (ISTl) 
combine to give 

fc=i ''^ ' 

which is rewritten as vf{y,z,v^) = var^.|y^{2; | y; z, z^^} in (l27l) . Finally, plugging uf{y,z,i'^) into the 
definition of g'^^J^ -{y , z , u^) in (D3), we immediately obtain 



Appendix B 
Derivation of GAMP Functions g\nj and g'^ ^• 

In this appendix, we derive the GAMP quantities g\nj{f,v'^) and g[^-{f,v''') given in (|28])-(l30b. 
From (D4)-(D6), we note that g\nj{f,u^) and u''' g[^ ■{r^v'^) are the mean and variance, respectively, 
of the pdf 

^px,{r)CN{r-f,v'), (53) 
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where Zj = j'^px^ir) CM {r;f .v"^). Using (l42l) together with the definition of px^i-) from (O, we find 

px^{r)CM{r;f,v'') 

= XjCMir; 0, u})CM{r; r, v'') + (1 - \j)CM{r; 0, CA/'(r; f, u'') (54) 
= AjCAA(f; 0, v] + z."^) CAr(r; f7j(z.'^), z.'^7j(z.'^)) 

+ (1 - \j)CM{f- 0, + u"-) CM{r- r7°(i^0, i^'7°(i^")) (55) 
for 7?(i/'') = (1 + i/Vf°)"i and 7j(i/'^) = (1 + 1^7^'])"^ This implies that 

Z,j = \fM{r] 0, v] + i/^) + (1 - Aj)CAA(r; 0, + z/''). (56) 

Thus, the mean obeys 

5in,j(^,i^'") = ^ [ rpx^{r)CJ\f{r;f,u'') (57) 



2~ 7j('^)?' + ^ rj{T^)r, (58) 



= aj{f,u'') =1- aj{f,v'') 

yielding (1281 ). where a straightforward manipulation relates the expression for aj{r,v^') above with its 
definition in (l30l ). 

Since, for the pdf in (1531 ). g\r,j is the mean and I'^glf^ j is the variance, we can write 

'^''akjir^'^l = ^^kl'px,(r)CAA(r;f,r.7 - |gin,,f (59) 
= aj{\n}\^ + i/^j) + (1 - a,)(l^7°l' + z^'7°) " |aj7l^ + (1 " ajhjrf, (60) 
which can be simplified to yield 
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